- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 424
Add ability to query columns and upload tables #3403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Codecov Report❌ Patch coverage is  
 Additional details and impacted files@@            Coverage Diff             @@
##             main    #3403      +/-   ##
==========================================
+ Coverage   70.72%   70.85%   +0.13%     
==========================================
  Files         232      232              
  Lines       20041    20096      +55     
==========================================
+ Hits        14174    14240      +66     
+ Misses       5867     5856      -11     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zoghbi-a - Thanks for the PR, I have some review comments but this should be close to be mergeable.
The most substantial one where I would also ask Adam's insight is the naming of the new method as we have and can have in the future a similar functionality in other modules, too, so some consistency would be nice.
        
          
                astroquery/heasarc/core.py
              
                Outdated
          
        
      | return self.query_region(pos, catalog=mission, spatial='cone', | ||
| get_query_payload=get_query_payload) | ||
|  | ||
| def query_by_column(self, catalog, params, *, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only comment here is about the naming, we don't have methods named this, but we do have query_criteria. How would you feel about using that, too?
Also, I would suggest to use something else then 'params', maybe constraint, or filter, or condition, or something even more descriptive?
cc @keflavich to have multiple eyes/opinions on the API naming choices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to see another archive supporting this type of query, it's incredibly useful in Vizier:
https://astroquery.readthedocs.io/en/latest/vizier/vizier.html#specifying-keywords-output-columns-and-constraints-on-columns
I favor query_constraints because it's a pre-existing nomenclature that astroquery users may be accustomed to.
Instead of params, how about column_filters, again by analogy to Vizier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting query_criteria as it's used in more modules, but OTOH I admit it's kind of confusing naming. btw, here is the quick grep of what we have right now, including this PR:
   1 query_aliases
   1 query_apex_quicklooks
   7 query_async
   1 query_bibcode
   1 query_bibobj
   1 query_by_column
   1 query_catalog
   1 query_constraints_async
   3 query_criteria
   4 query_criteria_async
   1 query_criteria_count
   1 query_cross_id
   1 query_cross_id_async
   1 query_crossid_async
   1 query_hierarchy
   1 query_hips
   1 query_hsa_tap
   1 query_hsc_matchid_async
   1 query_ida_tap
   1 query_ids_catalogs
   1 query_ids_maps
   1 query_ids_spectra
   1 query_instrument
   4 query_lines_async
   1 query_loc
   1 query_main
   1 query_metadata
   1 query_mission_cols
   1 query_mission_list
   1 query_molecule
   1 query_name_async
  10 query_object
  13 query_object_async
   1 query_object_catalogs
   1 query_object_count
   1 query_object_maps
   1 query_object_spectra
   1 query_objectids
   1 query_objects
   1 query_objects_async
   1 query_observations
   1 query_photoobj_async
   1 query_planet
   1 query_raw
   1 query_refcode
   1 query_refcode_async
  12 query_region
  16 query_region_async
   1 query_region_catalogs
   1 query_region_count
   1 query_region_iau
   1 query_region_iau_async
   1 query_region_maps
   1 query_region_sia
   1 query_region_spectra
   2 query_sia
   1 query_simple
   1 query_specobj_async
   2 query_sql_async
   1 query_ssa
   1 query_sso
   1 query_surveys
   6 query_tap
   2 query_target
   1 query_with_wcs
   1 query_with_wcs_async
   1 query_xsa_tap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will switch to query_constraints and column_filters given the precedence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing how it is done in the Vizier module, I am tempted to make the column constraint in query_region and remove the extra function. Apologies for not seeing the precedence before. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column filters are now part of query_region (9366c3c). I find it to be cleaner than my original implementation.
| return self.tap.search( | ||
| query, language='ADQL', maxrec=maxrec, uploads=uploads) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick as it doesn't matter really, but just a FYI: we allow long (120) lines here, no need to do these kind of line breaks
        
          
                astroquery/heasarc/core.py
              
                Outdated
          
        
      | # if maxrec is more than the server limit, we set a higher limit | ||
| if maxrec is not None and maxrec > 100000: | ||
| adql = adql.replace('SELECT ', f'SELECT TOP {maxrec*4} ') | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain this logic a little bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trying to address #3387, but looking at it now, it is only a narrow fix. I will need to check with the backend team for this. It looks to me it could be a backend issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this part. I may followup with a separate PR once I dig more.
| return cols | ||
|  | ||
| def query_tap(self, query, *, maxrec=None): | ||
| def query_tap(self, query, *, maxrec=None, uploads=None): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think propagating the table uploads could be super useful for the other methods calling query_tap internally, so please consider adding it to query_region and the new query_by_column, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uploads make sense only if the user writes a TAP query because they need to reference the uploaded columns in the query. I am not sure it can be done in a generic way in query_region or query_by_column
…te tests and docs accordingly
eb052f6    to
    9366c3c      
    Compare
  
    
This PR adds several enhancements that have been added to include feedback from users (including reported issues; e.g #3387, #3321) since the last refactor to use the VO backend. The changes include:
query_by_columnto allow querying of different catalog columns. The user passes a dict that is parsed into a TAP WHERE statements.query_tap.download_dataBecause
query_by_columnshares code with the existingquery_region, the shared part has now been moved to an internal function_query_execute, which is called by bothquery_by_columnandquery_region.